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Abstract. We describe an application of the MultiNest algorithm to gravitational wave 
data analysis. MULTINEST is a multimodal nested sampling algorithm designed to efficiently 
evaluate the Bayesian evidence and return posterior probability densities for likelihood 
surfaces containing multiple secondary modes. The algorithm employs a set of 'live' points 
which are updated by partitioning the set into multiple overlapping ellipsoids and sampling 
uniformly from within them. This set of 'live' points climbs up the likelihood surface through 
nested iso-likelihood contours and the evidence and posterior distributions can be recovered 
from the point set evolution. The algorithm is model-independent in the sense that the specific 
problem being tackled enters only through the likelihood computation, and does not change 
how the 'live' point set is updated. In this paper, we consider the use of the algorithm for 
gravitational wave data analysis by searching a simulated LISA data set containing two non- 
spitming supermassive black hole binary signals. The algorithm is able to rapidly identify all 
the modes of the solution and recover the true parameters of the sources to high precision. 
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1. Introduction 

There is currently much active research into data analysis algorithms for gravitational wave 
detectors, both on the ground and for the proposed space-based gravitational wave detector, 
the Laser Interferometer Space Antenna (LISA) [IJ. Research into LISA data analysis is 
being encouraged by the Mock LISA Data Challenges Q (MLDC), which so far have 
included data sets containing individual and multiple white-dwarf binaries, a realisation of the 
whole galaxy of compact binaries, single and multiple non-spinning supermassive black hole 
(SMBH) binaries, either isolated or on top of a galactic confusion background and isolated 
extreme-mass-ratio inspiral (EMRI) sources in purely instrumental noise. While most of 
these Challenges have been solved in the sense that several groups returned solutions that 
matched the injected parameters, the EMRI case caused particular difficulty |[3]|4l|5]|6l. These 
problems arose primarily because the likelihood surface contains many secondary maxima, 
and so algorithms such as Markov Chain Monte Carlo (MCMC) tended to become stuck 
on secondary maxima and were unable to find the primary mode. In Challenge 3 of the 
MLDC, two new sources were introduced that have similar multi -modal likelihood surfaces — 
spinning black hole binaries and cosmic string cusps |7J. The final analysis of the LISA data 
will also have to contend with the presence of multiple overlapping sources simultaneously 
present in the data stream. For these reasons, it is desirable to have algorithms that are able 
to simultaneously identify and characterize all the modes of the likelihood surface. One 
approach is to use an evolutionary algorithm, and a recent implementation of these ideas 
is described in |8|. However, there are also existing tools that have been developed for other 
applications that have the necessary features to be useful for the gravitational wave case. One 
such algorithm is MultiNest ll9l lT0l . which is a multi-modal nested sampling algorithm. 

The nested sampling algorithm fTTl was developed as a tool for evaluating the Bayesian 
evidence. It employs a set of 'live' points, each of which represents a particular set of 
parameters in the multi-dimensional search space. These points move as the algorithm 
progresses, and climb together through nested contours of increasing likelihood. At each step 
the algorithm works by finding a point of higher likelihood than the lowest likelihood point in 
the 'live' point set and then replacing the lowest likelihood point with the new point. Nested 
sampling has been applied to the issue of model selection for ground based observations of 
gravitational waves [12] and forms part of the evolutionary algorithm for LISA data analysis 
discussed in ISj. However, the primary difficulty in using nested sampling is to efficiently 
sample points of higher likelihood to allow the points to climb up the likelihood surface. 

MultiNest solves the problem of efficient sampling by using the current set of 'live' 
points as a model of the shape of the likelihood surface. The idea is to decompose the 
set of 'live' points into a sequence of overlapping ellipsoids, and then sample from within 
one of these ellipsoids which is chosen at random from the current set. The algorithm is 
model-free in the sense that the evolution of the 'live' point set does not make use of any 
knowledge about the properties of the likelihood surface. The only model-specific element 
is the subroutine which computes the likelihood at a given point in parameter space. This 
approach reduces much of the computational overhead associated with evaluating Fisher 
Matrices etc. in order to move the 'live' points on the likelihood surface. MultiNest 
has proven to be able to efficiently find and separate modes of a likelihood surface in many 
different applications llT3l 174111311161 . This suggests it may be a powerful tool for gravitational 
wave data analysis as well, and in this paper we explore these possibilities using a search for 
non-spinning SMBH binaries as a test case. Such sources are known to possess a degeneracy 
in the likelihood, since at low frequency the response of the detector to the true sky position 
and the point antipodal to it on the sky can not be distinguished. We can therefore use these 
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sources as a test case to investigate the ability of MultiNest to explore a multi-modal 
likelihood in the context of gravitational wave detection. 

The coalescences of supermassive black holes (SMBHs) in binaries are expected to be 
a major source of gravitational waves (GW) for LISA, and we should be able to detect such 
systems out to a redshift of z 10 with intrinsic parameter errors of a fraction of a percent 
to a few percent and extrinsic parameter errors of a few to a few tens of percent ifTTl [181 . 
The detection problem for non-spinning SMBH binaries has already been solved, in the sense 
that several groups have been able to successfully detect and recover parameters for such 
systems both in controlled studies ifTSl [T9l and in blind tests in the context of the MLDC. 
Several algorithms have been proven to be successful — Metropolis Hastings Monte Carlo 
(MHMC) L18iil9J . template-bank based searches 11201 1211 . time-frequency methods |22| and 
hierarchical searches mixing time-frequency and MHMC techniques ||231 . More recently, 
the evolutionary algorithm has also been shown to be effective in searches for non-spinning 
SMBH binaries |8 1. Thus, although the application of MultiNest to these systems will be 
an interesting test case, it is not essential to the success of LISA. However, the real power of 
the algorithm will be in its application to multi-modal likelihood surfaces for systems such 
as EMRIs. The present work is an indication of the power of the technique and a stimulus 
for further research on this algorithm in the context of gravitational wave data analysis. In 
addition, the evidence values returned by MultiNest can be used for model selection, 
which will also be important for LISA. Bayesian model selection using MCMC techniques 
has already been explored in the LISA context 1241 for the case of white-dwarf binaries. 
MultiNest provides an alternative approach to addressing similar questions, although we 
will not discuss that application of the algorithm in the present paper. 

The paper is organised as follows. In section[2]we give a brief introduction to Bayesian 
inference and then give details of the MultiNest algorithm in section [3] including an 
overview of nested sampling and a description of the ellipsoidal sampling scheme. In section]?] 
we briefly describe the waveform and noise models used in this analysis. In section ]5] we 
describe an application of the algorithm to a simultaneous search for two non-spinning black 
hole binaries in a single data set. We provide results on the parameter recovery achieved by 
the algorithm and compare these to the one sigma error estimates from the Fisher Matrix. We 
finish in section ]6] with a summary and a discussion of possible future applications of this 
work. 



2. Bayesian Inference 

Our detection methodology is built upon the principles of Bayesian inference, and so we 
begin by giving a brief summary of this framework. Bayesian inference methods provide a 
consistent approach to the estimation of a set of parameters in a model (or hypothesis) H 
for the data D. Bayes' theorem states that 

where Pr(0|D,iJ) = P{&) is the posterior probability distribution of the parameters, 
Pr(D|0,i/) = £(0) is the likelihood, Fr{&\H) = 7r(0) is the prior distribution, and 
Pr(D|7J) = Z is the Bayesian evidence. 

Bayesian evidence is simply the factor required to normalise the posterior over and is 
given by: 

£(0)7r(0)d^0, (2) 
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where N is the dimensionahty of the parameter space. Since the Bayesian evidence is 
independent of the parameter values 0, it is usually ignored in parameter estimation problems 
and the posterior inferences are obtained by exploring the un-normalized posterior using 
standard MCMC sampling methods. 

Bayesian parameter estimation has been used quite extensively in a variety of 
astronomical applications, including gravitational wave astronomy, although standard MCMC 
methods, such as the basic Metropolis-Hastings algorithm or the Hamiltonian sampling 
technique (see e.g. |27|), can experience problems in sampling efficiently from a multi- 
modal posterior distribution or one with large (curving) degeneracies between parameters. 
Moreover, MCMC methods often require careful tuning of the proposal distribution to sample 
efficiently, and testing for convergence can be problematic. 

In order to select between two models Hq and Hi one needs to compare their respective 
posterior probabilities given the observed data set D, as follows: 

Pr(gi|D) ^ Pr(D|gi)Pr(gi) ^ Zi Pr(gi) 

Pr(i/o|D) Pr(D|i/o)Pr(ifo) Zq Pr(i/o) ' 

where Pr(i/i)/ Pr(i7o) is the prior probability ratio for the two models, which can often 
be set to unity but occasionally requires further consideration (see, for example, llT3l [T4l for 
the cases where the prior probability ratios should not be set to unity). It can be seen from 
Eq. (j3]l that the Bayesian evidence plays a central role in Bayesian model selection. As the 
average of likelihood over the prior, the evidence automatically implements Occam's razor: 
a simpler theory which agrees well enough with the empirical evidence is preferred. A more 
complicated theory will only have a higher evidence if it is significantly better at explaining 
the data than a simpler theory. 

Unfortunately, evaluation of Bayesian evidence involves the multidimensional integral 
(Eq. Q) and thus presents a challenging numerical task. Standard techniques like 
thermodynamic integration 1281 1241 are extremely computationally expensive which makes 
evidence evaluation typically at least an order of magnitude more costly than parameter 
estimation. Some fast approximate methods have been used for evidence evaluation, such 
as treating the posterior as a multivariate Gaussian centred at its peak (see, for example, 
pSi), but this approximation is clearly a poor one for highly non-Gaussian and multi-modal 
posteriors. Various alternative information criteria for model selection are discussed in Ii26i . 
but the evidence remains the preferred method. 



3. Nested Sampling and the MultiNest Algorithm 

Nested sampling [TT] is a Monte Carlo method targetted at the efficient calculation of the 
evidence, but also produces posterior inferences as a by-product. It calculates the evidence 
by transforming the multi-dimensional evidence integral into a one-dimensional integral that 
is easy to evaluate numerically. This is accomplished by defining the prior volume X as 
dX = TT{&)d^&, so that 

X{X) = f 7T{&)d^&, (4) 

Jc(&)>\ 

where the integral extends over the region(s) of parameter space contained within the iso- 
likelihood contour £(0) — A. The evidence integral, Eq. (j2]|, can then be written as 

Z = [ C{X)dX, (5) 




where C{X), the inverse of Eq. Q, is a monotonically decreasing function of X. Thus, if one 
can evaluate the Hkehhoods Ci = C{Xi), where Xi is a sequence of decreasing values, 

{)< Xm < ■ ■ ■ < X2 < Xi < Xq ^ I, (6) 
as shown schematically in Fig. [T] the evidence can be approximated numerically using 
standard quadrature methods as a weighted sum 

M 

Z = ^dwi, (7) 

1=1 

where the weights Wi for the simple trapezium rule are given by Wi = ^{Xi^i — Xj+i). An 
example of a posterior in two dimensions and its associated function C{X) is shown in Fig.[r| 

3.1. Evidence Evaluation 

In order to evaluate the evidence value given in (Eq. [7]), a set of N 'live' points are drawn 
uniformly from the full prior 7r(0) with the initial prior volume Xq set to 1. The likelihood 
value for each of the N 'live' points is calculated and the point with the minimum likelihood 
value £0 is removed from the 'live' point set and is replaced by another point sampled 
uniformly from the prior with likelihood C > Cq. This results in the reduction in the prior 
volume within the iso-likelihood contour by a factor ti i.e. Xi — tiXo where ti follows the 
distribution of the largest of the N samples drawn uniformly from the interval [0, 1] which is 
given by Pr(i) — Nt'^^^. This procedure is repeated at each subsequent iteration i, at which 
the point with the lowest likelihood value Ci is replaced by a new point sampled uniformly 
from the prior with C > Ci until the entire prior volume has been traversed. The expected 
value and the standard deviation of log t, which dominates the geometrical exploration is 
given by: 

E [log t] = -l/N, a [log t]^l/N. (8) 

With the values of log t at different iterations being independent of each other, the prior 
volume at iteration i is expected to be: 

logX,^-{i±Vi)/N. (9) 

Thus, one takes 

X, = exp{-i/N). (10) 
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3.2. Stopping Criterion 

The primary aim of nested sampling algorithm is to calculate the evidence value and so it 
should be terminated when the evidence value has been calculated to required accuracy. One 
way to achieve this is by checking whether the change in the evidence value from one iteration 
to the other is smaller than a specified tolerance but in cases where the posterior contains 
narrow peaks close to its maximum, this can result in an underestimated evidence value. 

Skilling [11 1 provides a robust stopping condition by calculating an upper limit on the 
evidence that can be determined from the set of current 'live' points. This limit is calculated by 
assuming that all the 'live' points have the likelihood value equal to the maximum-likelihood 
£max and so the largest evidence contribution that can be made by the remaining portion of 
the posterior is AZj = £„iax^i. >Cniax can be taken to be the maximum likelihood value 
amongst the 'live' points. At the beginning of the nested sampling algorithm, it is possible for 
the high likelihood regions of the posterior to not have been explored adequately resulting in 
the maximum-likelihood value £,naxi amongst the 'live' points being a lot smaller than the 
true maximum-likelihood value. The remaining prior volume Xi on the other hand would 
be quite high and therefore the largest evidence contribution AZj = CmaxXi would be 
high as well allowing the algorithm to proceed. At subsequent stages, the remaining prior 



volume Xi decreases exponentially as given by (Eq. 10 1 and the high likelihood regions of 
the posteriors are explored in greater detail with estimated £max getting closer and closer to 
the true maximum-likelihood value. The largest evidence contribution AZi = £niax-'^i, thus 
provides a robust stopping criteria. 

We choose to stop when this quantity would no longer change the final evidence estimate 
by 0.5 in log-evidence. 

3.3. Posterior Inferences 

Although the nested sampling algorithm has been designed to calculate the evidence value, 
accurate posterior inferences can also be generated as a by-product. Once the evidence Z is 
found, each one of the final 'live' points and the discarded points from the nested sampling 
process, i.e., the points with the lowest likelihood value at each iteration i of the algorithm is 
simply assigned the probability weight 

Pi = ■ (11) 

These samples can then be used to calculate inferences of posterior parameters such as 
means, standard deviations, covariances and so on, or to construct marginalised posterior 
distributions. 



3.4. MultiNest Algorithm 

The most challenging task in implementing the nested sampling algorithm is drawing samples 
from the prior within the hard constraint C > Ci at each iteration i. Employing a 
naive approach that draws blindly from the prior would result in a steady decrease in the 
acceptance rate of new samples with decreasing prior volume (and increasing likelihood). The 
MultiNest algorithm [9, 10| tackles this problem through an ellipsoidal rejection sampling 
scheme by enclosing the 'live' point set within a set of (possibly overlapping) ellipsoids 
and a new point is then drawn uniformly from the region enclosed by these ellipsoids. The 
number of points in an individual ellipsoid and the total number of ellipsoids is decided by 
an 'expectation-maximization' algorithm so that the total sampling volume, which is equal 
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(a) (b) 

Figure 2. Illustrations of the ellipsoidal decompositions performed by MultiNest. The 
points given as input are overlaid on the resulting ellipsoids. 1000 points were sampled 
uniformly from: (a) two non-intersecting ellipsoids; and (b) a torus. 



to the sum of volumes of the ellipsoids, is minimized. This allows maximum flexibility and 
efficiency by breaking up a mode resembling a Gaussian into a relatively small number of 
ellipsoids, and if the posterior mode possesses a pronounced curving degeneracy so that it 
more closely resembles a (multi-dimensional) 'banana' then it is broken into a relatively 
large number of small 'overlapping' ellipsoids (see Fig. [2]i. With enough 'live' points, this 
approach allows the detection of all the modes simultaneously, resulting in typically two 
orders of magnitude improvement in efficiency and accuracy over standard methods for 
inference problems in cosmology and particle physics phenomenology (see, for example. 

The elUpsoidal decomposition scheme described above also provides a mechanism for 
mode identification. By forming chains of overlapping ellipsoids (enclosing the 'live' points), 
the algorithm can identify distinct modes with distinct ellipsoidal chains, e.g., in Fig.|2]panel 
(a) the algorithm identifies two distinct modes while in panel (b) the algorithm identifies only 
one mode as all the ellipsoids are linked with each other because of the overlap between them. 
Once distinct modes have been identifed, they are evolved independently. 

Another feature of the MultiNest algorithm is the evaluation of the global as well as 
the 'local' evidence values associated with each mode. These evidence values can be used 
in calculating the probability that an identified 'local' peak in the posterior corresponds to a 
real object (see, for instance, 1.12., 13.. MJ ). We defer the discussion of quantifying the SMBH 
detection to a later work. 

4. Problem Description 

4.1. Waveform model 

The waveform from a binary composed of two non-spinning black holes depends on nine 
parameters: A — {ln{Mc) ,hi{fi) , , ip, In(ic), t, (fc, ln(£'L), i^}, where A/^ is the chirp mass, 
H is the reduced-mass, {9, </>) are the sky location of the source, tc is the time-to-coalescence, 
L is the inclination of the orbit of the binary, (pc is the phase of the GW at coalescence, Dl 
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is the luminosity distance and ip is the polarization of the GW measured in the Solar System 
barycentre (SSB). t/j is the angle between the principal polarisation axes of the source and the 
"natural" polarisation axes to use in the SSB, which are perpendicular to the line of sight to 
the source and the orbital angular momentum of the binary. We model the waveform using the 
restricted post-Newtonian approximation, in which the two polarizations of the GW are |29 1 
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where m = mi + m2 is the total mass of the binary, 77 — mim2/m^ is the reduced mass 
ratio, G is Newton's constant and c is the speed of light. The chirp mass and reduced mass 
are given in terms of m and 7] as = mrf'/^ and ^ = mrj. The function u) = d^orb/dt is 
the orbital frequency, and ^ = ipc — f{t) = 2$or6 is the gravitational wave phase. We take 
these to 2PN order 
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5Gto 

The above model describes the inspiral only and not the merger or ringdown phase. To avoid 
artifacts when taking Fourier transforms, we follow the approach introduced in ifTSl and now 
used in the MLDC |2| and continue the inspiral until the orbit reaches the innermost stable 
circular orbit at 6Af , but employ a hyperbolic taper from 7M to ensure the waveform goes 
smoothly to zero as the end of the inspiral approaches. The taper has the value of 1 for 
r > 7M and then smoothly goes to zero at r = 6A/, where the waveform finishes. To 
implement the LISA response, we use the low-frequency approximation, as described in |30|. 
In the low-frequency limit we expect there to be bright modes in the likelihood at both the 
true sky position and at a position approximately antipodal to it, as mentioned earlier The 
antipodal position corresponds to the shift O^tt — 9,(()^(f) + Ti. The 'antipodal' mode 
of the likelihood is strictly antipodal at low frequencies, but moves as the gravitational wave 
frequency increases. 

The four parameters "y^ci ^} are extrinsic parameters which only affect how the 

gravitational waveform phase at the detector projects into a detector response. It is possible to 
search over these automatically using a generalisation of the JF-statistic [311 . Details of how 
the jF-statistic is computed for non-spinning SMBH binaries may be found in ifTSl . We make 
use of the JF-statistic maximization in the first stage of our search. 

A gravitational wave search is sensitive to redshifted masses, fhi = (1 + z)mi, rather 
than the intrinsic masses. All our subsequent results will be quoted for redshifted mass 
quantities, which we will denote with an overbar for clarity. 
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4.2. Likelihood evaluation 



The gravitational waveform signals can be thought of as occupying a vector space, on which 
there is a natural scalar product ||32ll33l 

df 



{h\s)=2 



Snif) 



Hf)s*{f) + h*{mf) 



where 



h{f) = / dth{t)e^'''f' 



(17) 



(18) 



is the Fourier transform of the time domain waveform h{t). The quantity S'„(/) is the one- 
sided noise spectral density of the detector. For a family of sources with waveforms h{t; A) 
that depend on parameters A, the output of the detector, s{t) — h{t; Ao) + n{t), consists of 
the true signal h{t; Aq) and a particular realisation of the noise, n{t). Assuming that the noise 
is stationary and Gaussian, the logarithm of the likelihood that the parameter values are given 
by A is 



log/: (a) 



= C-(s-h 



-h{\ )/2, 



(19) 



and it is this log-likelihood that is evaluated at each 'live' point in the search. The constant, 
C, depends on the dimensionality of the search space, but its value is not important as we are 
only interested in the relative likelihoods of different points. 

In section |5] we will compare the errors in our recovered parameters to the theoretical 
noise-induced errors. The latter can be computed as cji = ^(r-i)" where 



r ■ — 



dh 



dh 



(20) 



is the Fisher Information Matrix (FIM). 



4.3. Noise model 



The noise consists of an instrumental part, which we take from ll34l 
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where L = 5 x lO^ km is the arm-length for LISA, ST'l/) = 4 x 10^22 /Hz and 
^n^'^if) = 9x lO^''" m^/s^/Hz are the position and acceleration noise respectively. 
has units of Hz^. The quantity = 1/{2ttL) is the mean transfer frequency for the LISA 
arms. In addition, there is a noise contribution from the confusion foreground of galactic 
compact binaries. We represent this using the model of ll35l[36ll 
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mi/M© 


1712 /Mq 


tc/yrs 


e 





z 








1 1 X 10^ 

2 4 X 10*^ 


1 X 10^ 
1 X 10^ 


0.90 
1.02 


0.6283 
2.1206 


4.7124 
3.9429 


1.0 
0.8 


1.1120 
0.6565 


1.2330 
1.0646 


2.2220 
1.5116 



Table 1. Parameter values for the two SMBHBs considered in this study. The dual-TDl 
channel SNRs for the sources are 200 and 131 respectively. 



The total noise is the sum of these contributions. At low frequencies, the LISA detector can 
be thought to consist of two independent right-angle interferometers with independent noise 
described by the same power spectral density. The total likelihood is obtained by summing 
the scalar product ( 17 1 over the two detectors, and the FIM of the network is similarly given 
by the sum of the two FIMs. 



5. Results of MultiNest Search 



To test the MultiNest algorithm we generated a data set containing two SMBH binaries, 
one of which coalesced during the observation time, and one of which coalesced a few days 
after the end of the observation. We took these sources to have the same parameters as the 
two sources searched for by the evolutionary algorithm in fSl to facilitate direct comparison 
between the two techniques. The parameters of the two sources are summarised in Table [T] 
We used a standard flat WMAP cosmology, with radiation, matter and dark energy densities 
{nR,nM,i^A) = (4.9 X 10-^ 0.27, 0.73) and iJo=71 km/s/Mpc |f37^. The redshifts of the 
sources therefore correspond to luminosity distances of 6.634Gpc and 5.024Gpc respectively. 
The redshifted chirp mass and reduced mass are (Af^, m) = (4.9289 x 10^, 1.8182 x 10^) Mq 
for source 1, and (2.997 x 10^, 1.44 x 10^) Mq for source 2, and the full, dual-TDI channel, 
signal-to-noise ratios (SNRs) are 200 and 131 respectively. 

For the first stage of the search, we included the J^-statistic maximization in the 
likelihood evaluation and used MultiNest with 1000 'live' points to search only the five 
dimensional space of intrinsic parameters {rrii,Tfi2,d,<p,tc)- We used wide priors for this 
search of mi, 7712 € [5 x 10^ 3 x 10^] Mq, € [0.85, l.l]yrs, 9 e [0,7r] and cj) e [0,27r]. 
In general, the number of 'live' points required for a search is problem specific, depending 
on the dimensionality of the search space, and on the complexity of the likelihood surface. 
Therefore, the ideal number must be found by trial and error Using too few 'live' points leads 
to under-sampling of the posterior. It is normally clear from the posterior distributions whether 
the likelihood has been properly sampled or not. Using more 'live' points than necessary does 
not affect the results, but is computationally more expensive. The choice of 1000 in this 
case was based on previous experience in other contexts, but appeared to work well. The 
algorithm returned 1 1 possible modes and, based on the time of coalesence, it was clear that 
five were associated with the coalescing source, and six with the non-coalescing source. Four 
modes were the same secondaries found using the evolutionary algorithm |8|, each of which 
corresponded to mass and spin values close to those of one of the sources, but had sky position 
in the vicinity of one of the two sky solutions of the other source. Of the three additional 
secondaries identified in the likelihood, one was associated with the coalescing source, and 
was very close to the true values of the masses and tc, but was also in the wrong position on the 
sky. The other two secondaries were associated with the non-coalescing source and had mass 
and tc parameters that were relatively close to the true values, although many Fisher Matrix 
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a away, but had incorrect sky locations. It is likely that all of these secondaries correspond 
to parameter space points for which the waveforms match the brighter waveform cycles that 
come toward the end of the inspiral, but not the earlier part of the waveform. This stage of the 
search took approximately one and a half hours to run on a single 3.0 GHz Intel Woodcrest 
processor, and used 170, 000 likelihood evaluations. 

In Table [2] we list the mean and standard deviation of the posteriors recovered by the 
algorithm in this stage, for the two true and the two antipodal modes of the sources. In Table[3] 
we assess the error in parameter recovery as a multiple of the theoretical error, computed from 
the Fisher matrix. This is one of the standard approaches used to assess entries to the LISA 
Mock Data Challenge and so we include it for easier comparison to other algorithms discussed 
in the literature. The posterior distributions are not Gaussian, which is assumed for the Fisher 
matrix estimate, so we do not necessarily expect these to be an accurate indication of the 
true error In practice, we will estimate the uncertainty in the parameter estimation from 
the recovered posterior and, in all cases, the true solution lies within l-cr of the peak of the 
recovered posterior, when a is computed from the posterior in that way. We quote results in 
Table [3] for the two true solutions, and for the two antipodal sky solutions. Errors are quoted 
for the redshifted chirp mass, Mc, and reduced mass, /l, rather than riii and m2, since the 
former are used more conventionally in the literature and this therefore facilitates comparison 
to other work, in particular the evolutionary algorithm |8 1. 

We see that the algorithm recovered all of the intrinsic parameters to extremely high 
precision, being within Icr of the true parameters for all but the antipodal sky solution of the 
second source. As mentioned earlier, the 'antipodal' solution is no longer exactly antipodal 
on the sky at higher frequencies. Thus, the apparently larger errors in the parameters for the 
antipodal solution do not mean that the algorithm failed to find the true peak of the likelihood, 
but merely that the secondary is shifted from the precisely antipodal position. These results 
can be directly compared to results obtained using the evolutionary algorithm in 18J. The 
MultiNest results are slightly better than those of the evolutionary algorithm, although in 
both cases the results were within the error we would expect from noise fluctuations and so 
this difference could easily be due to a difference in the noise realisation used to generate 
the data. We note that the evolutionary algorithm also recovered parameters for the antipodal 
solution of the non-coalescing source that were about 4(7 from the true parameters, which is 
consistent with the prior explanation for that difference. Presently, the MultiNest algorithm 
runs about five to ten times faster than the evolutionary algorithm, but the latter has not yet 
been optimized and further work is needed to understand how the efficiency and speed of both 
algorithms will scale when they are applied to other types of gravitational wave source. 

The fact that the intrinsic parameters of the coalescing source are all so well determined 
is due to a combination of the particular noise realisation used and the strong correlations 
between the intrinsic parameters. The correlations ensure that if one of the parameters is well- 
determined then all of the parameters will be well-determined. The results seem surprising 
because the error is so low in all of the parameters, which, if the parameters were uncorrected, 
would require an usually low error in five independent variables. However, in reality we only 
require an unusually low error in one parameter and then correlations force all the errors to 
be small. Over many noise realisations we would expect the errors in the intrinsic parameters 
to wander over the range shown in the posteriors, but this wandering would be correlated 
between the different intrinsic parameters. 

For the second stage of the search, we ran MultiNest on the full nine-dimensional 
parameter space. We ran a separate analysis, each with 500 'live' points, in the vicinity of 
each of the modes identified during the first stage of the search, taking priors on the intrinsic 
parameters that were A £ [A— 5fT, A+5(t], where A and a were the mean and standard deviation 
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0.6269 ± 0.0047 
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2.1267 ±0.0060 


3.9480 ± 0.0076 
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1.0156 ± 0.0059 


0.8076 ±0.0077 



Table 2. Parameter inferences derived from the first, jF-statistic, stage of the search. The 
quoted uncertainties are the one sigma errors from the posterior recovered by MultiNest. 
We list the parameter inferences for both the true and antipodal ('A') sky solutions for each 
source. 
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Table 3. Errors in the intrinsic parameter estimation from the first stage of the search. We 
quote errors for both the true and antipodal ('A') sky solutions for each source. The first row 
for each solution gives the one sigma estimation from the Fisher matrix, o-piM-, while the 
second row gives AA = |(At — Xr) / c!yim\, where \t denotes the true parameter value, 
and Aj; denotes the recovered value of the parameter. 



in that parameter as estimated from the posterior recovered in the first stage of the search. In 
addition, we took natural priors on the four extrinsic parameters — t e [0,7r], i/ic G [0,7r], 
V' G [0,7r] and € [2, 15]Gpc. If the population of supermassive black hole mergers 
was expected to be uniformly distributed in space, then the natural prior on Z?^ would be 
proportional to D\. However, in practice the distribution of events will trace the hierarchical 
assembly of structure and will therefore be complicated and very model-dependent. For that 
reason, we preferred to use a minimal prior, i.e., a flat prior. This is consistent with the 
approach used for the MLDC, in which the source SNRs are drawn from a flat prior |2|. The 
prior on (p^ is [0, tt] rather than [0, 2ti\ since there is an angle degeneracy in the extrinsic 
subspace corresponding to the shift ^ ^ -0 + ''''/2, 0c ^ 0c + ti"- 

In the second stage of the search, only one mode of the likelihood was found within 
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the tight priors on the intrinsic parameters provided by the first stage. The second stage of 
the search required ~ 50, 000 likelihood evaluations and took ~ 30 minutes on a single 3.0 
GHz Intel Woodcrest processor for each mode. The marginalized posterior distributions for 
the true sky locations of the bright and faint sources are shown in Figs.|3]and |4]respectively. 
The posteriors for the faint source are very broad in the extrinsic parameters. This is to 
be expected, as we do not observe the coalesence for this source and so the error in the 
estimate of the phase at coalesence, (pc, is very large, and propagates into other parameters 
due to correlations. In Table |4] we quote the values of the extrinsic parameters recovered 
by the search for the two true modes and the two antipodal modes, and also quote errors as 
multiples of the one sigma Fisher Matrix error estimate as before. The antipodal solutions 
have different values for t and V' as well, which are obtained by the transformation t — > tt — i, 
tp ^ TT — ij} ll38l . so the true values of the extrinsic parameters at the antipodal sky 
location are {L,ip,0,(l3) = (2.0296,1.9086,2.5133,1.5708) for the coalescing source,"!", 
and (2.4851, 0.5062, 1.0210, 0.8013) for the non-coalescing source, "2". We do not include 
the intrinsic parameters in this table as their values did not change significantly in the second 
stage. The posteriors shown in Figs.|3]^were computed during the second search stage for all 
parameters. We see that the algorithm recovers the extrinsic parameters for both sources and 
both sky solutions to the same high precision as the intrinsic parameters. The posteriors are 
highly non-Gaussian for the extrinsic parameters of the non-coalescing source in particular, so 
the Fisher matrix error should not be trustworthy. However, it appears to be giving predictions 
that are largely consistent with the recovered posteriors. 

The MultiNest algorithm also returns the evidence associated with each of the 
recovered modes. These log-evidences were 19,948.5 and 19,948.2 for the true and antipodal 
modes of the coalescing source and 8,551.3 and 8,555.1 for the true and antipodal modes of 
the non-coalescing source. These evidence values do not give very strong reason to favour one 
of the two sky-position modes over the other, but the sky position degeneracy is nearly perfect 
for the sources we are considering and therefore we would not necessarily expect to be able 
to distinguish them. However, the evidences of the 7 other modes identified by MultiNest 
were several hundred lower than the evidences of the true and antipodal modes of the source 
to which they corresponded. There would therefore be good reason to favour the true and 
antipodal modes over the other modes in the likelihood. If we were using full TDI waveforms 
at higher frequency we would expect to be able to distinguish between the true and antipodal 
solutions, and this should be reflected in the evidence values. More investigation is needed 
to fully understand the utility of evidence for characterisation of modes in the likelihood for 
these and other gravitational wave sources. 

6. Discussion 

We have described the use of a multi-modal nested sampling algorithm, MultiNest, as 
a tool for gravitational wave data analysis, illustrating the algorithm by searching for the 
signals from non-spinning supermassive black hole binaries in simulated LISA data. We used 
MultiNest to search a data stream containing two such mergers, and the algorithm was 
able to simultaneously and successfully identify both the true sky position of each source, and 
the antipodal sky solution, plus several other secondary modes of the likelihood surface. In 
a first stage search using the JF-statistic, the algorithm is able to recover the intrinsic source 
parameters to very high precision, within 2(t of the true answer as measured by the theoretical 
noise-induced error computed from the Fisher Matrix. Following up with a search on the full 
physical, nine-dimensional, parameter space, the search is also able to recover the extrinsic 
parameters to similar precision. In addition, the algorithm naturally recovers the posterior 



WLuljiNest for gravitational wave data analysis 



14 



O.EZ a^A 



^7 ^,7£ *.7^ 



□ ,E2 n.GA -I 7 A.7Z 4.74 D.9 £3.3 D.5 



<m:2<m 



0.82 Oei 'I? (1.72 t7A B.9 03 C 9 1 995 2 £ OOS 



□ EE 4.7 4.74 0.5 03 C.B 1.935 2 2.H 



I 2 E.DOE 



ii,7 ^,7£ t.7t D.a- 09 09 2 2.005 ^.9S5 2 2.0O5 



O.EE D.'S 



D.9 03 D.3 1.395 2 2. DOS 1.^35 Z 2. DOS 



2 2.5 



092 l}«4 



5 C& 19BS 2 £C<H ■ ^SS 2 2 005 



n.s 0.^ D.Q 



Figure 3. Marginalized ID and 2D posteriors for tlie primary mode of tlie bright, coalescing 
source (source 1) determined in the second stage analysis using the full likelihood with priors 
on the intrinsic parameters derived from the posteriors recovered in the first, jF-statistic, 
stage. Each plot shows the 2D posterior over the row (vertical axis) and column (horizontal 
axis) parameters. At the top of each column is the ID posterior for that column parameter 
The parameters and the parameter ranges shown in the plots, from top to bottom or left 
to right, are 0.63 < 9 < 0.64, 4.69 < < 4.74, 0.899992 < ic < 0.900008, 
1.993 X lO'^ Mq < mi < 2.007 x IO^Mq, 1.993 x IO^Mq < m2 < 2.007 x IO^Mq, 
0.9 < t < 1.2, 2 < <pc < S and 1 <■(/>< 3 respectively. The true parameter values are 
shown by the blue vertical lines and red crosses overlayed on the ID and 2D marginalized 
posterior plots respectively. 



MuljiNest for gravitational wave data analysis 



15 



a.M 2. IE Z.M 



211 £1? ?13 399 



1 

^ f.D133 



a.M Z.\2 £.13 



3.94 3.3U I.C^^^l 0139 I 02 



2.1 3 £.12 213 3 91 



1 2. IS S.13 ■i.SA \.C•ii^^■'3\SB 1.02 



m 



2.11 £12 33.1 :J3fi iCi^^l-SiS* i 02 



a.M 2.1^ a:13 



3.34 



m m m m 

I.[:1331<]I3S< L.oa 

t 



£11 £12 2 iJ 



4 - 



m « iF^ « 



[Mi 



E.1I 2.12 £13 



I.C'd?1'3IRe ■,.'32 

I,- y\(i.} 



i S.i E 



Figure 4. As Figure|3] but now for the faint, non-coalescing source (source 2). Tlie parameter 
ranges shown in the plots, from top to bottom or left to right, are 2.11 < 9 < 2.14, 
3.93 < < 3.97, 1.01988 < tc < 1.02004, 7.1 x 10"^ Mq < rfii < 7.5 x 10*^ Mq, 
1.74 X IO'^Mq < m2 < 1.81 x IO'^A/q, < t < 0.8, < </)c < 27r and < V < 
respectively. 
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Table 4. Maximum a posteriori values of the extrinsic parameters found in tlie second stage 
of the search. We quote results for both the two true and the two antipodal sky solutions, as 
for the intrinsic parameters. For each solution, the first row indicates the recovered value of 
the parameter, A^, the second row lists the one sigma error estimate from the Fisher Matrix, 
(Tpj]y[, and the third row gives AA = | (At — Xr) /ctpimI, where At is the true value of the 
parameter. 



distributions and the Bayesian evidence associated with each mode of the solution. 

The algorithm works extremely quickly, taking about two and a half hours on a single 
CPU in this case to complete the whole search. MultiNest is fully parallel, which reduces 
the run time for the whole search to ^ 20 minutes when using 10 processors. More 
importantly, it achieves these results without using any specific knowledge of the signals 
for which it is searching — the waveform model enters only in the likelihood evaluation and 
not in determining how the 'live' point set is updated. While many effective algorithms for 
searching for non-spinning SMBH binaries are akeady known, these sources have a multi- 
modal likelihood which provides a good test case with which to scope out the effectiveness 
of this algorithm for gravitational wave data analysis. The real power of this algorithm is in 
its ability to simultaneously identify and characterize all modes of a multi-modal likelihood 
surface. For signals from EMRIs, the presence of many secondaries in the likelihood has 
caused problems in data analysis ID U E |6l, and this is likely to be true in searches for 
spinning black hole binaries as well. MultiNest is perfectly adapted to tackling these types 
of problem, and so these are the next challenges for the algorithm. Research is needed to 
investigate how the run time, number of 'Uve' points needed etc. will scale for these other 
problems. However, the fact that the algorithm performed so well in the non-spinning black 
hole case, without using any knowledge of the properties of the waveforms, is reason to expect 
MultiNest will also perform strongly in other gravitational wave data analysis applications, 
for both space-based and ground-based detectors. 

A potentially even more important application of the algorithm in the context of 
gravitational-wave data analysis will be for evidence evaluation. MultiNest computes ev- 
idence values as it progresses, which can be used for model selection, e.g., to determine the 
number of a particular type of source that are present in a data stream or to test alternative 
theories of gravity etc. Model selection questions have been explored in a LISA context using 
other algorithms, including MCMC |24| and nested sampling [l^J. The speed and efficiency 
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of MultiNest in answering similar questions should be explored in the future, in order to 
compare and contrast to these other techniques. The results described in this paper suggest 
that MultiNest will also perform extremely well in such a model evaluation context. 
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